Spatial analysis is a process that begins with exploring and mapping a dataset and can lead to potentially complex models and visualizations of real world features and phenomena. Spatial queries are the building blocks of this analytical process. These queries are software operations that allow us to ask questions of our data and which return data metrics, subsets or new data objects. In this lesson we explore the two basic types of spatial queries: measurement queries and relationship queries.
Instructor Notes
The basic types of spatial queries are:
Both of these types of queries operate on the geometry of features in one or two datasets and are dependent on the type of geometry. For example, with point features you can make distance measurements or ask what points are spatially inside polygon objects. Polygon features, on the other hand, allow for a wider range of both measurement and spatial relationship queries.
An important distinction between these two types of queries is that measurement queries depend on the CRS of the data while spatial relationship queries do not. This is because topological relationships, the term used to describe spatial relationships, are invariant to rotation, translation and scaling transformations like those that CRS transformations entail.
We already know how to do attribute queries with our data. For example, we can select one or more specific counties by name or select those counties where the total population is greater than 100,000 because we have these columns in the dataset.
Spatial queries are special because they are dynamic. For example, we can compute area from the geometry without it already being encoded or we can select BART stations in Berkeley even if city is not encoded in the BART data by linking those two spatial datasets in the same geographic space. This dynamic query capability is extremely powerful!
In this lesson we’ll work through examples of each of those types of queries.
Then we’ll try a very common spatial analysis method that is a conceptual amalgam of those two types: proximity analysis.
Load the libraries we will use.
library(sf)
library(tmap)
Read in the CA census tracts data and then take a look at its geometry and attributes.
census_tracts = st_read("notebook_data/census/Tracts/cb_2018_06_tract_500k.shp")
## Reading layer `cb_2018_06_tract_500k' from data source `/Users/pattyf/Documents/Dlab/workshops/2021/Geospatial-Fundamentals-in-R-with-sf/notebook_data/census/Tracts/cb_2018_06_tract_500k.shp' using driver `ESRI Shapefile'
## Simple feature collection with 8041 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.4096 ymin: 32.53416 xmax: -114.1312 ymax: 42.00948
## Geodetic CRS: NAD83
plot(census_tracts$geometry)
head(census_tracts)
## Simple feature collection with 6 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -122.5009 ymin: 37.92535 xmax: -120.3345 ymax: 39.30875
## Geodetic CRS: NAD83
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 1 06 009 000300 1400000US06009000300 06009000300 3 CT
## 2 06 011 000300 1400000US06011000300 06011000300 3 CT
## 3 06 013 303102 1400000US06013303102 06013303102 3031.02 CT
## 4 06 013 303202 1400000US06013303202 06013303202 3032.02 CT
## 5 06 013 303203 1400000US06013303203 06013303203 3032.03 CT
## 6 06 013 307102 1400000US06013307102 06013307102 3071.02 CT
## ALAND AWATER geometry
## 1 457009794 394122 MULTIPOLYGON (((-120.764 38...
## 2 952744514 195376 MULTIPOLYGON (((-122.5001 3...
## 3 6507019 0 MULTIPOLYGON (((-121.7294 3...
## 4 3725528 0 MULTIPOLYGON (((-121.7235 3...
## 5 6354210 0 MULTIPOLYGON (((-121.7449 3...
## 6 1603153 0 MULTIPOLYGON (((-121.8226 3...
Select just the Alameda County census tracts.
census_tracts_ac = census_tracts[census_tracts$COUNTYFP=='001',]
plot(census_tracts_ac)
We’ll start off with some simple measurement queries.
We can get the area of each of our census tracts using thesf function st_area.
st_area(census_tracts_ac)[1:10]
## Units: [m^2]
## [1] 1162918.0 1000842.5 1664804.5 3178120.0 814375.4 1251118.1 1034057.0
## [8] 858325.5 1997754.4 1678617.1
Okay!
We got…
numbers!
…?
Question
Let’s take a look at our CRS.
st_crs(census_tracts_ac)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
Wow! We’re working with data that are in what is called an unprojected CRS. That means that the coordinates are latitude and longitude values and the units are decimal degrees. However, the sf::st_area function automatically returned area measurements in square meters (rather than in square degrees, which don’t make sense.)
How did it do this? Well, you can check out the help documentation for ?st_area for more information. If the data have a projected CRS, st_area uses Euclidean geometry to return area measurements in the units of the CRS. For an unprojected CRS, st_area calculates geodetic area on a curved surface model of the Earth and returns measurements in square meters. Pretty cool and pretty useful right?
That said, when doing spatial analysis, we will almost always want to convert all of our data to the same projected CRS since many spatial operations do not work with geographic CRSs.
Time to project! We’ll transform the data to the UTM Zone 10N, NAD83 CRS (EPSG 26910) which is appropriate for Northern California location data and is highly accurate for measurement queries for areas within the zone.
#Transform CRS of census tract data
census_tracts_ac_utm10 = st_transform(census_tracts_ac, 26910)
Now check it..especially look for the units.
st_crs(census_tracts_ac_utm10)
## Coordinate Reference System:
## User input: EPSG:26910
## wkt:
## PROJCRS["NAD83 / UTM zone 10N",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["UTM zone 10N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-123,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["North America - 126°W to 120°W and NAD83 by country"],
## BBOX[30.54,-126,81.8,-119.99]],
## ID["EPSG",26910]]
Now let’s try our area calculation again.
st_area(census_tracts_ac_utm10)[1:10]
## Units: [m^2]
## [1] 1162095.2 1000143.8 1663700.2 3176020.4 813846.5 1250301.3 1033393.0
## [8] 857775.7 1996529.8 1677565.1
What if we compare areas calculated with our unprojected and projected CRS data?
# Using format to make for easier to read display
print(format(st_area(census_tracts_ac)[[1]], big.mark=','))
## [1] "1,162,918 [m^2]"
print(format(st_area(census_tracts_ac_utm10)[[1]], big.mark=","))
## [1] "1,162,095 [m^2]"
Hmmm… The numbers are a bit different…specifically…
format((st_area(census_tracts_ac)[[1]] - st_area(census_tracts_ac_utm10)[[1]]),digits=0, big.mark=',')
## [1] "823 [m^2]"
You may have noticed that our census tracts already have an area column in them.
Let’s also compare the calculated areas with the data in this column.
print(st_area(census_tracts_ac)[[1]])
## 1162918 [m^2]
print(st_area(census_tracts_ac_utm10)[[1]])
## 1162095 [m^2]
print(census_tracts$ALAND[1])
## [1] 457009794
Question
What explains the discrepancy? Which areas are correct? Which are incorrect?
We can also calculate the area for Alameda county summing the areas of all census tracts.
sum(st_area(census_tracts_ac_utm10))
## 1943203171 [m^2]
We can look up how large Alameda County is to check our work.The county is 739 miles2, which is around 1,914,001,213 meters2. I’d say we’re pretty close!
# Sum the area of all Alameda county Census Tracts dynamically in square miles...
sum(units::set_units(st_area(census_tracts_ac_utm10),mi^2))
## 750.2749 [mi^2]
Calculating the area of all features and adding the output to the dataframe is a useful operation because it allows us to convert count variables like population to densities.
# Add the area of all Alameda County Census tracts to the data frame
census_tracts_ac_utm10$area_sqmi <-units::set_units(st_area(census_tracts_ac_utm10), mi^2)
# Check it by summing
print(sum(census_tracts_ac_utm10$area_sqmi))
## 750.2749 [mi^2]
# Take a look
head(census_tracts_ac_utm10,3)
## Simple feature collection with 3 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 560194.2 ymin: 4174539 xmax: 575344.2 ymax: 4189173
## Projected CRS: NAD83 / UTM zone 10N
## STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME LSAD
## 27 06 001 425101 1400000US06001425101 06001425101 4251.01 CT
## 28 06 001 428600 1400000US06001428600 06001428600 4286 CT
## 29 06 001 432600 1400000US06001432600 06001432600 4326 CT
## ALAND AWATER geometry area_sqmi
## 27 590870 2045459 MULTIPOLYGON (((560340.9 41... 0.4486875 [mi^2]
## 28 898967 1080420 MULTIPOLYGON (((563419 4180... 0.3861577 [mi^2]
## 29 1673450 0 MULTIPOLYGON (((573362.1 41... 0.6423583 [mi^2]
You may be wondering how R is managing the units of our measurements.
It turns out that sf depends on the units package to track units.
This is super convenient! But there is a gotcha:
# convert to square kilometers
sum(st_area(census_tracts_ac_utm10)) / (1000^2)
## 1943.203 [m^2]
Oops! Our manual conversion to square kilometers gave us the right number but kept the now-wrong units!
Here’s the proper way to convert:
units::set_units(sum(st_area(census_tracts_ac_utm10)), km^2)
## 1943.203 [km^2]
Much nicer! In case you’re wondering how we knew the right abbreviation to use for kilometers, check out the leftmost column in this reference table:
# View(units::valid_udunits())
We can use the st_length operator in the same way to calculate the length or perimeter of features. Always take note of the output units!
st_length(census_tracts)[1:10]
## Units: [m]
## [1] 125204.900 159644.736 11948.424 9673.456 12629.443 6430.248
## [7] 10607.608 7853.356 6170.446 5183.838
The st_distance function can be used to find the pairwise distance between two sets of geometries.
st_distance(census_tracts_ac_utm10, census_tracts_ac_utm10)[1:5,1:5]
## Units: [m]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.000 6705.837 15753.864 17394.946 20876.861
## [2,] 6705.837 0.000 8984.300 10162.990 14244.463
## [3,] 15753.864 8984.300 0.000 0.000 3031.971
## [4,] 17394.946 10162.990 0.000 0.000 1135.945
## [5,] 20876.861 14244.463 3031.971 1135.945 0.000
You can also use it to find the distance between specific features.
# Identify my tracts of interest
mytracts = c('4201','4202','4203','4204')
# What is the distance between tract 4201 and all other tracts
st_distance(census_tracts_ac_utm10[census_tracts_ac_utm10$NAME=='4101',],
census_tracts_ac_utm10[census_tracts_ac_utm10$NAME %in% mytracts,] )
## Units: [m]
## [,1] [,2] [,3] [,4]
## [1,] 19263.81 19220.14 18822.72 18946.69
Spatial relationship queries consider how two geometries or sets of geometries relate to one another in space. For example, you may want to know what schools are located within the City of Berkeley or what East Bay Regional Parks have land within Berkeley. You may also want to combine a measurement query with a spatial relationship query. Example, you may want to know the total length of freeways within the city of Berkeley.
Here is a list of some of the more commonly used sf spatial relationship operations.
These can be used to select features in one dataset based on their spatial relationship to another. In other works, you can use these operations to make spatial selections / create spatial subsets.
Enough talk. Let’s work through some examples.
First, load the CA Places data and select the city of Berkeley and save it to a sf dataframe.
places = st_read('notebook_data/census/Places/cb_2018_06_place_500k.shp')
## Reading layer `cb_2018_06_place_500k' from data source `/Users/pattyf/Documents/Dlab/workshops/2021/Geospatial-Fundamentals-in-R-with-sf/notebook_data/census/Places/cb_2018_06_place_500k.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1521 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.2694 ymin: 32.53416 xmax: -114.229 ymax: 41.99314
## Geodetic CRS: NAD83
berkeley = places[places$NAME=='Berkeley',]
plot(berkeley$geometry)
Then, load the Alameda County schools data and make it a spatial dataframe.
schools_df = read.csv('notebook_data/alco_schools.csv')
schools_sf = st_as_sf(schools_df,
coords = c('X','Y'),
crs = 4326)
Check that the two sf dataframes have the same CRS.
st_crs(schools_sf) == st_crs(berkeley)
## [1] FALSE
They don’t have the same CRS so we need to align them. Let’s transform (or reproject) the CRS of both of these dataframes to UTM10N, NAD83 (EPSG 26910). This is a commonly used CRS for Northern CA data.
# Transform data CRSs...
schools_utm10 <- st_transform(schools_sf, 26910)
berkeley_utm10 <- st_transform(berkeley, 26910)
If you look at the Schools data you will see that it has a City column. So we can subset the data by attribute to select the Schools in Berkeley. No need to do a spatial selection.
berkeley_schools = schools_utm10[schools_utm10$City=='Berkeley',]
dim(berkeley_schools)
## [1] 31 8
Confirm the results by plotting the data
plot(berkeley_utm10$geometry)
plot(berkeley_schools$geometry, add=T)
That looks good and was a relatively simple operation. But what if the schools data didn’t have that city column or if only some of the rows had a value in that column. How can we identify the schools in Berkeley spatially?
Here’s how!
# SPATIALLY select only the schools within Berkeley
berkeley_schools_spatial = schools_utm10[berkeley_utm10, , op=st_intersects] #NO QUOTES!!!
Yes that was it! Take a long look at that simple yet powerful spatial selection syntax.
You should interpret that syntax as:
"Select all features (i.e. rows) in the schools_utm10 dataframe:
and all of the columns:
(all because the extraction brackets have no second argument)
whose geometry spatially intersects the Berkeley_utm10 geometry
The op=st_intersects argument is optional because st_intersects is the default spatial selector.
To emphasize this, let’s rerun the last command
# SPATIALLY select only the schools within Berkeley
berkeley_schools_spatial = schools_utm10[berkeley_utm10, ]
spatiallly intersects mean?Here’s one way to explain it.
Geometry A spatially intersects Geometry B if any of its parts (e.g., a point, line segment, or polygon) is equivalent to, touches, crosses, is contained by, contains, or overlaps any part of Geometry B.
So st_intersects is the mother of all spatial relationships! It is the most general and the most useful. However, you can specify any of those more specific spatial relationships by setting op= to any of the options listed in the ?st_intersects? help documentation.
Let’s check out the sf object that our selection returned.
# How many schools did we get
dim(berkeley_schools_spatial)
## [1] 32 8
# Map the results
plot(berkeley_utm10$geometry)
plot(berkeley_schools_spatial$geometry, add=T)
Interestingly, we have one more school in Berkeley based on the spatial selection!? Let’s take a look.
plot(berkeley_utm10$geometry)
plot(berkeley_schools_spatial$geometry, add=T)
plot(berkeley_schools$geometry,col="red", add=T)
Let’s use an interactive tmap to zoom into the school that was not selected by attribute but was spatially selected.
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(berkeley_utm10) +
tm_borders() +
tm_shape(berkeley_schools_spatial) +
tm_dots(col="black", size=.3) +
tm_shape(berkeley_schools) +
tm_dots(col="red", size=.1)
IMPORTANT: The default spatial selection operator is
st_intersects. If you want to use any other operator, it must be specified.
For example, we can use the st_disjoint operator to select only those schools NOT in Berkeley.
# Select all Alameda County Schools NOT in Berkeley with the disjoint operator
berkeley_schools_disjoint = schools_utm10[berkeley_utm10, , op=st_disjoint]
# Plot the result
plot(berkeley_schools_disjoint$geometry)
plot(berkeley_utm10, col=NA, border="red", add=T)
## Warning in plot.sf(berkeley_utm10, col = NA, border = "red", add = T): ignoring
## all but the first attribute
There is no need to memorize these spatial operators (aka predicates)! Here is a fantastic sf cheatsheet that lists and briefly explains all these common functions (and many more).
Let’s load a new dataset, the CA Protected Areas Database (CPAD), to demonstrate these spatial queries in more detail.
This dataset contains all of the protected areas (parks and the like) in California.
We will use this data and the Alameda County Census Tract Data that we created earlier to ask “What protected areas are within Alameda County?”
First load the CPAD data.
cpad = st_read('./notebook_data/protected_areas/CPAD_2020a_Units.shp')
## Reading layer `CPAD_2020a_Units' from data source `/Users/pattyf/Documents/Dlab/workshops/2021/Geospatial-Fundamentals-in-R-with-sf/notebook_data/protected_areas/CPAD_2020a_Units.shp' using driver `ESRI Shapefile'
## Simple feature collection with 17068 features and 21 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -374984.2 ymin: -604454.8 xmax: 540016.3 ymax: 449743.2
## Projected CRS: NAD83 / California Albers
What is the CRS of the CPAD data?
Let’s transform the data to match census_tracts_ac_utm10.
cpad_utm10 = st_transform(cpad, st_crs(census_tracts_ac_utm10))
Let’s plot the data in so that we know what to expect. CPAD is big so wait for it…
plot(census_tracts_ac_utm10$geometry, col='grey', border="grey")
plot(cpad_utm10$geometry, col='green', add=T)
We can see from our map that some of the protected areas are completely within Alameda County, some of them overlap, and some are completely outside of the county. To get both of the “inside” and “overlaps” cases we use the st_intersects spatial selection operator, which is the default. Let’s check it out.
cpad_intersects = cpad_utm10[census_tracts_ac_utm10,] #st_intersects
cpad_within = cpad_utm10[census_tracts_ac_utm10, , op=st_within] #st_within
We can use tmap to explore the difference in the results from st_intersects vs st_within
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(census_tracts_ac_utm10)+
tm_polygons(col="gray", border.col="grey") +
tm_shape(cpad_intersects) +
tm_borders(col="green") +
tm_shape(cpad_within) +
tm_borders(col='red')
What you can see from the above, is that by default, st_intersects returns the features that intersect but it does not clip the features to the boundary of Alameda County. For that, we would need to use a different spatial operation - st_intersection.
Let’s try it!
cpad_in_ac = st_intersection(cpad_utm10, census_tracts_ac_utm10)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
Great! Now, if we scroll the resulting sf object we’ll see that the COUNTY column of our resulting subset gives us a good sanity check on our results. Or does it?
table(cpad_in_ac$COUNTY)
##
## Alameda Contra Costa San Joaquin Santa Clara
## 836 20 2 4
Always check your output - both the attribute table & the geometry!
head(cpad_in_ac)
## Simple feature collection with 6 features and 31 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 560241 ymin: 4187290 xmax: 562089.7 ymax: 4189117
## Projected CRS: NAD83 / UTM zone 10N
## ACCESS_TYP UNIT_ID UNIT_NAME
## 9474 Open Access 15869 McLaughlin Eastshore State Park
## 10131 Open Access 17626 Marina Park
## 11568 Open Access 23663 Davenport Park
## 12466 Restricted Access 28130 McLaughlin Eastshore State Park
## 12997 Restricted Access 46940 Emeryville Crescent State Marine Reserve
## 15136 Open Access 46002 Point Emery Park
## SUID_NMA AGNCY_ID AGNCY_NAME AGNCY_LEV
## 9474 29199 204 California Department of Parks and Recreation State
## 10131 21833 1098 Emeryville, City of City
## 11568 17939 1098 Emeryville, City of City
## 12466 29198 204 California Department of Parks and Recreation State
## 12997 30564 204 California Department of Parks and Recreation State
## 15136 29208 1098 Emeryville, City of City
## AGNCY_TYP AGNCY_WEB
## 9474 State Agency http://www.parks.ca.gov/
## 10131 City Agency http://www.ci.emeryville.ca.us/158/City-Parks
## 11568 City Agency http://www.ci.emeryville.ca.us/158/City-Parks
## 12466 State Agency http://www.parks.ca.gov/
## 12997 State Agency http://www.parks.ca.gov/
## 15136 City Agency http://www.ci.emeryville.ca.us/158/City-Parks
## LAYER MNG_AG_ID
## 9474 California Department of Parks and Recreation 2032
## 10131 City 1098
## 11568 City 1098
## 12466 California Department of Parks and Recreation 2032
## 12997 California Department of Parks and Recreation 2032
## 15136 City 2032
## MNG_AGENCY MNG_AG_LEV
## 9474 East Bay Regional Park District Special District
## 10131 Emeryville, City of City
## 11568 Emeryville, City of City
## 12466 East Bay Regional Park District Special District
## 12997 East Bay Regional Park District Special District
## 15136 East Bay Regional Park District Special District
## MNG_AG_TYP PARK_URL COUNTY
## 9474 Recreation/Parks District http://www.ebparks.org/parks/eastshore Alameda
## 10131 City Agency <NA> Alameda
## 11568 City Agency <NA> Alameda
## 12466 Recreation/Parks District http://www.ebparks.org/parks/eastshore Alameda
## 12997 Recreation/Parks District <NA> Alameda
## 15136 Recreation/Parks District http://www.ebarks.org Alameda
## ACRES LABEL_NAME YR_EST DES_TP
## 9474 1084.173 McLaughlin Eastshore SP 0 <NA>
## 10131 10.322 Marina Park 0 Local Park
## 11568 2.102 Davenport Park 0 Local Park
## 12466 456.308 McLaughlin Eastshore SP NA <NA>
## 12997 58.981 Emeryville Crescent State Marine Reserve 1985 <NA>
## 15136 20.687 Point Emery Park 0 Local Park
## GAP_STS STATEFP COUNTYFP TRACTCE AFFGEOID GEOID NAME
## 9474 <NA> 06 001 425101 1400000US06001425101 06001425101 4251.01
## 10131 4 06 001 425101 1400000US06001425101 06001425101 4251.01
## 11568 4 06 001 425101 1400000US06001425101 06001425101 4251.01
## 12466 <NA> 06 001 425101 1400000US06001425101 06001425101 4251.01
## 12997 <NA> 06 001 425101 1400000US06001425101 06001425101 4251.01
## 15136 4 06 001 425101 1400000US06001425101 06001425101 4251.01
## LSAD ALAND AWATER area_sqmi geometry
## 9474 CT 590870 2045459 0.4486875 [mi^2] MULTIPOLYGON (((561544 4188...
## 10131 CT 590870 2045459 0.4486875 [mi^2] MULTIPOLYGON (((560255.1 41...
## 11568 CT 590870 2045459 0.4486875 [mi^2] POLYGON ((560870.4 4188045,...
## 12466 CT 590870 2045459 0.4486875 [mi^2] MULTIPOLYGON (((560376.8 41...
## 12997 CT 590870 2045459 0.4486875 [mi^2] POLYGON ((561322.1 4187901,...
## 15136 CT 590870 2045459 0.4486875 [mi^2] POLYGON ((561506.7 4189117,...
Let’s also use an overlay plot to check the output geometry.
tm_shape(census_tracts_ac_utm10) +
tm_polygons(col='gray', border.col='gray') +
tm_shape(cpad_in_ac) +
tm_polygons(col = 'ACRES', palette = 'YlGn',
border.col = 'black', lwd = 0.4,
alpha = 0.8,
title = 'Protected areas in Alameda County, colored by area')
It really depends! But make sure you understand the difference.
st_intersects is a logical operator that returns True if two geometries intersect in any way. When used to subset (or filter) a spatial dataframe, st_intersects returns those features in the dataframe that intersect with the filter dataframe.
On the other hand, st_intersection returns a new spatial dataframe that set intersection of the two dataframes, including both the geometries and attributes of the intersecting features. Use st_intersection with caution and always check your results.
It’s your turn.
Write a spatial relationship query to create a new dataset containing only the BART stations in Berkeley.
Run the next two cells to (1) load the dataset containing Berkeley BART stations and then reproject it to the same CRS as that used by the Berkeley_utm10 dataframe (EPSG: 26910)’ (2) plot these two datasets in an overlay map.
Then, write your own code to: 1. Spatially select the BART stations that are within Berkeley 2. Plot the Berkeley boundary and then overlay the selected BART stations.
# load the Berkeley boundary
bart_df = st_read("notebook_data/transportation/bart.csv")
## Reading layer `bart' from data source `/Users/pattyf/Documents/Dlab/workshops/2021/Geospatial-Fundamentals-in-R-with-sf/notebook_data/transportation/bart.csv' using driver `CSV'
## Warning: no simple feature geometries present: returning a data.frame or tbl_df
bart_sf = st_as_sf(bart_df,
coords = c('lon','lat'),
crs = 4326)
# transform to EPSG:26910
bart_utm10 = st_transform(bart_sf, st_crs(berkeley_utm10))
# display
head(berkeley_utm10)
## Simple feature collection with 1 feature and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 559347.6 ymin: 4188961 xmax: 567371.4 ymax: 4195617
## Projected CRS: NAD83 / UTM zone 10N
## STATEFP PLACEFP PLACENS AFFGEOID GEOID NAME LSAD ALAND
## 1213 06 06000 02409837 1600000US0606000 0606000 Berkeley 25 27127391
## AWATER geometry
## 1213 18715614 MULTIPOLYGON (((559347.6 41...
Plot the data together
plot(bart_utm10$geometry)
plot(berkeley_utm10$geometry, border='blue', add=T)
Now, in the cell below, write the code to spatially select the Berkeley BART stations, then make the map.
# YOUR CODE HERE:
# Spatially select the BART stations within Berkeley
# Plot the Bart stations in Berkeley overlaid on top of the Berkeley City Boundary
Now that we’ve seen the basic idea of spatial measurement and spatial relationship queries, let’s take a look at a common analysis that combines those concepts: promximity analysis.
Proximity analysis seeks to identify near features - or, in other words, all features in a focal feature set that are within some maximum distance of features in a reference feature set.
A very common workflow for this analysis is:
Buffer around the features in the reference dataset to create buffer polygons.
Run a spatial relationship query to find all focal features that intersect (or are within) the buffer polygons.
Let’s read in our bike boulevard data again.
Then we’ll find out which of our Berkeley schools are within a block’s distance (200 meters) of the bike boulevards.
bike_blvds = st_read('notebook_data/transportation/BerkeleyBikeBlvds.geojson')
## Reading layer `BerkeleyBikeBlvds' from data source `/Users/pattyf/Documents/Dlab/workshops/2021/Geospatial-Fundamentals-in-R-with-sf/notebook_data/transportation/BerkeleyBikeBlvds.geojson' using driver `GeoJSON'
## Simple feature collection with 211 features and 10 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: 561541.2 ymin: 4189007 xmax: 566451.6 ymax: 4193483
## Projected CRS: WGS 84 / UTM zone 10N
plot(bike_blvds$geometry)
Of course, we need to reproject the boulevards to our projected CRS.
bike_blvds_utm10 = st_transform(bike_blvds, st_crs(berkeley_utm10))
Now we can create our 200 meter bike boulevard buffers.
bike_blvds_buf = st_buffer(bike_blvds_utm10, dist=200)
Now let’s overlay everything.
tm_shape(berkeley_utm10) +
tm_polygons(col = 'lightgrey') +
tm_shape(bike_blvds_buf) +
tm_polygons(col = 'pink', alpha = 0.5) +
tm_shape(bike_blvds_utm10) +
tm_lines() +
tm_shape(berkeley_schools_spatial) +
tm_dots(col = 'purple', size=0.2)
Great! Looks like we’re all ready to run our spatial relationship query to complete the proximity analysis. At this point (pun intended) select the schools that are in within the bike boulevard buffer polygons.
schools_near_blvds = berkeley_schools_spatial[bike_blvds_buf,]
# or
#schools_near_blvds = berkeley_schools_spatial[bike_blvds_buf, , op='st_within']
Now let’s overlay again, to see if the schools we selected make sense.
tm_shape(berkeley_utm10) +
tm_polygons(col = 'lightgrey') +
# add the bike blvd buffer polygons
tm_shape(bike_blvds_buf) +
tm_polygons(col = 'pink', alpha = 0.5) +
# Add the bike blvd line features
tm_shape(bike_blvds_utm10) +
tm_lines() +
# Add all berkeley schools
tm_shape(berkeley_schools_spatial) +
tm_dots(col = 'purple', size=0.2) +
# Add schools near bike blvds in yellow
tm_shape(schools_near_blvds) +
tm_dots(col = 'yellow', size=0.2)
You can use st_distance and its companion function st_nearest_feature to compute the distance between each feature and the nearest bike boulevard. The st_nearest_feature function returns the ID of the closest feature.
# Identify the nearest bike boulevard for each school
nearest = st_nearest_feature(berkeley_schools_spatial,bike_blvds_utm10)
# take a look!
nearest
## [1] 71 171 39 136 42 30 163 172 127 171 189 156 137 33 187 197 50 207 169
## [20] 102 125 137 3 109 207 76 135 173 56 102 146 76
Then we can calculate the distance between each school and it’s nearest bike boulevard.
st_distance(berkeley_schools_spatial, bike_blvds_utm10[nearest,], by_element=TRUE)
## Units: [m]
## [1] 13.848161 985.459488 309.889446 369.402946 196.011379 15.332395
## [7] 27.250406 439.758905 107.902846 926.216792 193.030072 181.836256
## [13] 373.736477 215.903128 184.321307 186.446907 1.288383 94.064527
## [19] 211.477566 218.613628 186.913116 230.212129 15.162313 188.829602
## [25] 232.764113 224.700672 173.920971 15.892361 514.765384 92.556921
## [31] 93.426741 128.131187
Now it’s your turn to try out a proximity analysis!
Write your own code to find all BERKELEY schools within walking distance (1 km) of a BART station.
As a reminder, let’s break this into steps:
bart_utm10 dataframeTo see the solution, look at the hidden text below.
# YOUR CODE HERE:
# Spatially select the Berkeley Bart Stations
# You may have done this in a previous exercise.
# buffer the BART stations to 1 km
# spatially select the schools within the buffers
# Map your results with tmap
# plot the Berkeley boundary (for reference) in lightgrey
# add the BART stations (for reference) to the plot in green
# add the BART buffers (for check) in lightgreen
# add all Berkeley schools (for reference) in black
# add the schools near BART (for check) in yellow
Compute the distance between each Berkeley School and its nearest BART station!
#YOUR CODE HERE
Leveraging what we’ve learned in our earlier lessons, we got to work with map overlays and start answering questions related to proximity. Key concepts include:
st_area,st_lengthst_distancest_intersects,st_intersectionst_within, etc.st_buffer